Note: When clicking on a Digital Object Identifier (DOI) number, you will be taken to an external site maintained by the publisher.
Some full text articles may not yet be available without a charge during the embargo (administrative interval).
What is a DOI Number?
Some links on this page may take you to non-federal websites. Their policies may differ from this site.
-
We study the estimation of Zipfian distributions under 𝐿1 loss, and provide near minimax optimal bounds in several regimes. Specifically, we assume observations arrive from a known alphabet, and with a known decay rate parametrizing the Zipfian, but we do not know a priori which alphabet elements have larger probability than others. We present a novel Sort and Snap estimator, which uses the empirical proportions to sort the alphabet, and then snaps them to the associated term from the Zipfian distribution. For arbitrary decay rates and smaller alphabet sizes, as well as for large decay rates and large alphabet sizes, we show an exact or minor variant of this estimator is near minimax optimal and has exponential improvement over the standard empirical proportion estimator. However, for small decay rates and larger alphabet sizes a simulation study indicates the standard empirical proportion estimator is competitive with Sort and Snap procedures. In addition to providing nearly tight bounds for important high-dimensional estimation problems, we believe the Sort and Snap estimator, and its analysis, will have independent interest.more » « less
An official website of the United States government

Full Text Available